Abstract

We show that the Tangled Nature model can be interpreted as a general formulation of the 
quasi-species model by Eigen et al. in a frequency dependent fitness landscape. We present a 
detailed theoretical derivation of the mutation threshold, consistent with the simulation results, 
that provides a valuable insight into how the microscopic dynamics of the model determine the 
observed macroscopic phenomena published previously. The dynamics of the Tangled Nature 
model is defined on the microevolutionary time scale via reproduction, with heredity, variation, 
and natural selection. Each organism reproduces with a rate that is linked to the individual's
genetic sequence and depends on the composition of the population in genotype space. Thus the
microevolutionary dynamics of the fitness landscape is regulated by, and regulates, the evolution
of the species by means of the mutual interactions. At low mutation rate, the macroevolutionary
pattern mimics the fossil data: periods of stasis, where the population is concentrated in a network
of coexisting species, are interrupted by bursts of activity. As the mutation rate increases, the
duration and the frequency of bursts increase. Eventually, when the mutation rate reaches a
certain threshold, the population is spread evenly throughout the genotype space showing that 
natural selection only leads to multiple distinct species if adaptation is allowed time to cause 
fixation. 
I. INTRODUCTION



Explaining observed macro-evolutionary patterns as collective emergent properties of
systems with many interacting degrees of freedom, whether these be single individuals or
'species', is an alluring challenge for researchers with a background in statistical physics [1, 2].
The quasi-species model by Eigen et al. [3, 4] has proved useful when investigating the
behaviour of populations in a given fixed fitness landscape and it provides a firm paradigm
for many models.

The fundamental idea in the approach by Eigen et al. is to identify species with sequences 
in genotype space. Positions in genotype space which are assigned particularly high fitness 
are called wildtypes, that is, the forms that predominate in a population are well adapted to 
the environment. During the reproduction event, mutations are seen as errors of the replica- 
tion of the parental sequence. The effect is thus to spread the population from the original 
point to neighbouring positions in genotype space. If one were to use a classical Darwinian 
view on such a process, the population would then be sharply localised in genotype space 
on the position which corresponds to a high fitness: all other positions would be cancelled 
by their low fitness. This can only be true if the replication process is by and large accurate. 
A replication process with too high a mutation rate would produce copies of the original
fit parent with so many errors that selection is unable to maintain the population at the
original point.

By lowering the mutation rate progressively, variation becomes less effective in dispersing
the population since the offspring are more similar to the parents. The quasi-species
model predicts a threshold in the mutation rate at which the multiplication
process changes drastically. A gradual decrease of the mutation rate takes the system
from a random population, diffused as scattered points in genotype space, to a population
constrained to a few positions.

The transition of the process from a random state to an ordered one is a phase transition,
with the mutation rate acting as a control parameter. The nature of the transition
has been extensively studied. In the seminal paper by Eigen et al., where the transition was
first noticed, each species has a predetermined fixed fitness. Subsequently,
the quasi-species model has been analysed in different fitness landscapes [6, 7], for different
topologies of the genotype space [8], and for spatially resolved models [9], each confirming
the original results. Finally, the error threshold in a model with a dynamical fitness landscape
has been analysed. In this case, however, the dynamics is regulated artificially
from outside.

It has been shown that it is possible to map the original quasi-species model onto a
two-dimensional Ising system with nearest-neighbour interaction in one direction [11], and
that, in this representation, for simple fitness landscapes, the correspondence links the error
threshold with a first-order phase transition [12]. A relation of fundamental importance
by Galluccio et al. [13] proves that the error threshold naturally arises as a consequence of
the model introduced and that, more generally, for a given mutation rate $p_{\mathrm{mut}}$ and a given
reproduction rate $p_{\mathrm{off}}$ it is possible to determine uniquely an upper limit for the length of
the genetic sequence.



We show that the Tangled Nature model, introduced in detail in Refs. [14, 15], can
be considered as a general formulation of the quasi-species model. The generalisation is
provided by relaxing the condition of fixed population size, which, in the original
formulation, acts as selection principle on the sequences. The most important feature of
the Tangled Nature model (for details see Refs. [14, 15]) is that it creates multiple co-evolving
quasi-species in a frequency dependent fitness landscape, where the dynamics of
the landscape is an inherent property of the model. In this paper, we present in detail the
theoretical calculation of the mutation threshold, which accurately fits the simulation results [15].

It is also interesting to point out the connection between the Tangled Nature model and
game theoretical non-linear replicator dynamics [16]. In both cases the reproduction of a
given type of individuals depends on the configuration of the entire population. One therefore
expects to find stable solutions to the dynamics of the Tangled Nature model similar to the
Nash equilibria or Evolutionary Stable Strategies found for replicator dynamics. We have
stressed this relation by using the term quasi-Evolutionary Stable Strategies to denote the
quasi-stable configurations of the Tangled Nature model.

In Sec. II we briefly review the quasi-species model by Eigen et al. Section III briefly discusses
the definition of the Tangled Nature model with an intrinsically generated dynamic
fitness landscape. We discuss in detail the dynamics of the model in terms of difference
equations. Section IV contains a discussion of the error threshold, both theoretically and
numerically, and finally in Sec. V we discuss the relation between the Tangled Nature model and
the quasi-species model by Eigen et al.
II. THE QUASI-SPECIES MODEL



Eigen et al. [3] introduced a model in which the effects of various mutation rates on
a process of replication of finite sequences of binary values are explored. Each sequence
$S^a = (S^a_1, S^a_2, \ldots, S^a_L)$, where $S^a_i \in \{-1, +1\}$, $i = 1, 2, \ldots, L$, in genotype space represents
a species. Each existing sequence $S^a$ replicates with a constant rate $p^a_{\mathrm{off}} = p_{\mathrm{off}}(S^a)$ and
degrades with a constant and universal (i.e., independent of position) rate $p_{\mathrm{kill}}$. The number
$n_a(t) = n(S^a)(t)$ of copies of a given sequence $S^a$ varies with time. The replication process is
not exact but prone to error. During the replication, the rate of mutation per gene is $p_{\mathrm{mut}}$.

The model has been solved analytically in the limit where one particular sequence is
assumed to have a high fitness, while mutants are less fit. For low mutation rate, the
population is concentrated around the top of the mountain in the fitness landscape. The
dominant sequence with its surrounding mutants is called a quasi-species. As the mutation
rate increases, the population drifts away from the top down to the ridges. Eventually, when
the mutation rate reaches a threshold value $p^{\mathrm{th}}_{\mathrm{mut}}$, the population is spread evenly throughout
the fitness landscape, that is, a phase transition occurs at $p^{\mathrm{th}}_{\mathrm{mut}}$.
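The single-peak picture described above can be explored with a short numerical sketch. It is not the calculation of Eigen et al.: it assumes a discrete-generation version of the model, a master sequence that replicates `sigma` times faster than every mutant, and it lumps sequences into classes by the number of genes that differ from the master. The names `class_mutation_matrix`, `master_fraction`, `sigma` and `generations` are illustrative choices, not taken from the paper.

```python
from math import comb

def class_mutation_matrix(L, p_mut):
    """Q[k][j]: probability that one copy of a sequence with k genes differing
    from the master sequence ends up with j differing genes."""
    Q = [[0.0] * (L + 1) for _ in range(L + 1)]
    for k in range(L + 1):
        for u in range(k + 1):          # differing genes flipped back
            for v in range(L - k + 1):  # matching genes flipped away
                prob = (comb(k, u) * comb(L - k, v)
                        * p_mut ** (u + v) * (1 - p_mut) ** (L - u - v))
                Q[k][k - u + v] += prob
    return Q

def master_fraction(L, p_mut, sigma, generations=500):
    """Iterate replication, mutation and normalisation; return the master's share."""
    Q = class_mutation_matrix(L, p_mut)
    fitness = [sigma] + [1.0] * L       # assumed single peak: master is sigma times fitter
    x = [1.0] + [0.0] * L               # start with the whole population on the master
    for _ in range(generations):
        grown = [fitness[k] * x[k] for k in range(L + 1)]
        new = [sum(grown[k] * Q[k][j] for k in range(L + 1)) for j in range(L + 1)]
        total = sum(new)
        x = [value / total for value in new]
    return x[0]
```

Under these assumptions, calling `master_fraction(20, 0.01, 10.0)` and `master_fraction(20, 0.3, 10.0)` contrasts a population localised on the master sequence with one that has dispersed over genotype space.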



III. THE DYNAMICS OF THE TANGLED NATURE MODEL 



The dynamics of the Tangled Nature model is defined via an elementary time step where
(a) one organism is randomly selected and killed with constant probability $p_{\mathrm{kill}}$, and (b) one
organism is randomly selected and, with probability $p_{\mathrm{off}}$ that depends on the current
composition of the population in genotype space, two offspring are produced and the parent
is then removed from the ecology [14, 15].
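The elementary time step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code: the reproduction probability is passed in as a hypothetical callable `p_off(genome, population)`, because the weight function that determines it is defined in the cited references rather than here, and genomes are assumed to be tuples of +1/-1.

```python
import random

def mutate(genome, p_mut, rng):
    """Copy a genome (tuple of +1/-1), flipping each gene independently with probability p_mut."""
    return tuple(-g if rng.random() < p_mut else g for g in genome)

def time_step(population, p_kill, p_mut, p_off, rng):
    """One elementary time step: a killing attempt followed by a reproduction attempt.

    population : list of genomes, modified in place
    p_off      : hypothetical callable (genome, population) -> reproduction probability
    """
    # (a) select one organism at random and kill it with probability p_kill
    if population:
        victim = rng.randrange(len(population))
        if rng.random() < p_kill:
            population.pop(victim)
    # (b) select one organism at random; with probability p_off it is
    #     replaced by two (possibly mutated) offspring
    if population:
        index = rng.randrange(len(population))
        parent = population[index]
        if rng.random() < p_off(parent, population):
            population.pop(index)
            population.append(mutate(parent, p_mut, rng))
            population.append(mutate(parent, p_mut, rng))
```

A constant stand-in such as `p_off=lambda genome, population: 0.25` is enough to exercise the bookkeeping; a frequency dependent fitness would replace it in an actual simulation.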



By analysing the dynamics it is possible to characterise the stable configurations that
may develop in the Tangled Nature model.

The difference equation describing the variation of the number of individuals at a position
during a single time step can be derived as follows. Let $n_a(t)$ denote the number of
individuals at position $S^a$. Then

$$n_a(t+1) = n_a(t) + \sum_E \Delta n_a(E)\, P(E), \tag{1}$$

where $t$ is the number of time steps and $E$ refers to any event that can affect $n_a$, that is, an
annihilation event or reproduction event, by an amount $\Delta n_a(E)$. The event $E$ occurs with
probability $P(E)$.

For a killing event, $\Delta n_a(E) = -1$ and the probability of a killing event is the product
of the probability of choosing an organism at position $S^a$ times the killing rate, that is,
$P(E) = \rho_a(t)\, p_{\mathrm{kill}}$, where we have introduced the density $\rho_a(t) = n_a(t)/N(t)$ of organisms at
position $S^a$, with $N(t)$ the total number of organisms.



For a reproduction event, a distinction has to be made between the case where reproduction
originates from position $S^a$, see Fig. 1(a), and reproduction originating from any other
position $S^b$ different from $S^a$, which we will call the "back-flow" contribution, see Fig. 1(b).
FIG. 1: Probabilities associated with a reproduction event. An organism at position $S^a$ is shown
with an open circle and any other type of organism with a solid circle. The columns labelled
"$E$" represent the three possible outcomes of a reproduction event; in the columns labelled
"$\Delta n_a(E)$" the variation of $n_a$ associated with event $E$ is listed. The probabilities involved are
given in the columns marked $P(E)$, where $p_0$ is the probability of no mutations during a
reproduction event and $1 - p_0$ the probability of at least one mutation, while $p$ is defined in Eq. (6).
(a) Reproduction originating from $S^a$. (b) Evaluation of the back flow associated with the events
$S^b \to S^a$.

The first case happens with probability $P = \rho_a(t)\, p^a_{\mathrm{off}}(t)$, that is, the probability of
picking an organism at position $S^a$ times the fitness of $S^a$. In this event, $n_a$ can decrease
by one unit ($\Delta n_a = -1$), increase by one unit ($\Delta n_a = +1$), or remain constant ($\Delta n_a = 0$),
with relative probabilities as calculated in Fig. 1(a).

The probability of having $i$ mutations during a single replication is

$$p_i = \binom{L}{i}\, p_{\mathrm{mut}}^{\,i}\, (1 - p_{\mathrm{mut}})^{L-i}, \qquad \text{with} \quad \sum_{i=0}^{L} p_i = 1. \tag{2}$$



From Fig. 1(a) we can deduce the net contribution to the population at position $S^a$ by
summing over all possible events:

$$\sum_E \Delta n_a(E)\, P(E) = p_0^2 - (1 - p_0)^2 = 2p_0 - 1. \tag{3}$$
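The sum in Eq. (3) can be checked by enumerating the outcomes of a reproduction event explicitly. The sketch below assumes, as in the text, that the parent is removed and that each of the two offspring independently stays at the parental position with probability `p0`; the function name is illustrative.

```python
from itertools import product

def expected_change_at_parent_position(p0):
    """Expected change of n_a when an organism at S^a reproduces.

    The parent is removed (-1); each offspring that is copied without
    mutation (probability p0) adds +1 back at the same position.
    """
    expected = 0.0
    for first_exact, second_exact in product([True, False], repeat=2):
        prob = (p0 if first_exact else 1 - p0) * (p0 if second_exact else 1 - p0)
        delta = -1 + int(first_exact) + int(second_exact)
        expected += delta * prob
    return expected
```

For any `p0` between 0 and 1 the result agrees with the closed form $2p_0 - 1$ up to rounding.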

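The "back flow" contribution occurs with probability

$$P = \rho_b(t)\, p^b_{\mathrm{off}}(t), \tag{4}$$

that is, the probability of picking an organism at a position $S^b \neq S^a$ times the fitness of $S^b$.
In this case, the variations and the probabilities involved are shown in Fig. 1(b).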
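In order to mutate from $S^b$ to $S^a$, $L d_{ab}$ mutations are necessary, where

$$d_{ab} = d(S^a, S^b) = \frac{1}{2L} \sum_{i=1}^{L} \left| S^a_i - S^b_i \right|, \tag{5}$$

so

$$p = p_{\mathrm{mut}}^{\,L d_{ab}}\, (1 - p_{\mathrm{mut}})^{L(1 - d_{ab})} \tag{6}$$

is the probability of creating an organism at position $S^a$ originating from position $S^b$.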

As $L d_{ab}$ mutations are needed, the expected contribution of a back-flow event from
position $S^b$ is, see Fig. 1(b),

$$\sum_E \Delta n_a(E)\, P(E) = 2p(1 - p) + 2p^2 = 2p = 2\, p_{\mathrm{mut}}^{\,L d_{ab}}\, (1 - p_{\mathrm{mut}})^{L(1 - d_{ab})}. \tag{7}$$
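Equations (5) to (7) translate directly into code. The following sketch assumes genotypes are tuples of +1/-1 of equal length; the function names and the helper `all_genotypes` are illustrative and not part of the paper.

```python
from itertools import product

def all_genotypes(L):
    """Every sequence of length L with entries -1 or +1."""
    return list(product((-1, 1), repeat=L))

def hamming_fraction(sa, sb):
    """d_ab of Eq. (5): fraction of genes at which two +/-1 sequences differ."""
    L = len(sa)
    return sum(abs(x - y) for x, y in zip(sa, sb)) / (2 * L)

def transition_probability(sa, sb, p_mut):
    """Eq. (6): probability that one offspring of S^b lands exactly on S^a."""
    L = len(sa)
    flips = round(L * hamming_fraction(sa, sb))  # number of genes that must mutate
    return p_mut ** flips * (1 - p_mut) ** (L - flips)

def back_flow(sa, sb, p_mut):
    """Eq. (7): expected gain of n_a from one reproduction event at S^b != S^a."""
    p = transition_probability(sa, sb, p_mut)
    one_arrives = 1 * 2 * p * (1 - p)   # exactly one offspring lands on S^a
    both_arrive = 2 * p ** 2            # both offspring land on S^a
    return one_arrives + both_arrive
```

A useful consistency check is that `transition_probability`, summed over every target genotype for a fixed parent, equals one, since each offspring has to land somewhere in genotype space.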

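Thus, the full expression for the difference equation is

$$n_a(t+1) = n_a(t) - \rho_a(t)\, p_{\mathrm{kill}} + \rho_a(t)\, p^a_{\mathrm{off}}(t)\,(2p_0 - 1)
+ 2 \sum_{b \neq a} \rho_b(t)\, p^b_{\mathrm{off}}(t)\, p_{\mathrm{mut}}^{\,L d_{ab}}\, (1 - p_{\mathrm{mut}})^{L(1 - d_{ab})}. \tag{8}$$

This is the equivalent of the quasi-species equation by Eigen et al. The main difference is
that the rates of production depend on the current composition of the population.
Summing Eq. (8) over all positions in genotype space we find, as expected,

$$N(t+1) = N(t) - p_{\mathrm{kill}} + \langle p_{\mathrm{off}} \rangle, \tag{9}$$

where $\langle p_{\mathrm{off}} \rangle = \sum_a \rho_a(t)\, p^a_{\mathrm{off}}(t)$ is the reproduction probability averaged over the population.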
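The difference equation, Eq. (8), and the sum rule, Eq. (9), can be verified numerically on a small genotype space. The sketch below makes two simplifying assumptions that are not part of the model: the reproduction probabilities are supplied as fixed numbers for a single step (in the model they depend on the population), and the dictionaries `n` and `p_off` list every genotype of the space, including unoccupied ones.

```python
from itertools import product

def expected_step(n, p_off, p_kill, p_mut):
    """Apply Eq. (8) once.

    n, p_off : dicts mapping each genotype (tuple of +/-1) to n_a and p_off^a;
               p_off is treated as frozen for this single step (an assumption)
    """
    N = sum(n.values())
    L = len(next(iter(n)))
    p0 = (1 - p_mut) ** L                      # probability of an exact copy
    rho = {a: n[a] / N for a in n}
    new = {}
    for a in n:
        own = rho[a] * (p_off[a] * (2 * p0 - 1) - p_kill)
        back = 0.0
        for b in n:
            if b == a:
                continue
            flips = sum(x != y for x, y in zip(a, b))   # L * d_ab
            back += 2 * rho[b] * p_off[b] * p_mut ** flips * (1 - p_mut) ** (L - flips)
        new[a] = n[a] + own + back
    return new

# Example on the full genotype space for L = 3 with arbitrary illustrative values.
genotypes = list(product((-1, 1), repeat=3))
n_example = {g: float(i + 1) for i, g in enumerate(genotypes)}
p_off_example = {g: 0.1 + 0.1 * i for i, g in enumerate(genotypes)}
n_next = expected_step(n_example, p_off_example, p_kill=0.2, p_mut=0.05)
```

Summing `n_next` over all genotypes reproduces $N(t) - p_{\mathrm{kill}} + \langle p_{\mathrm{off}} \rangle$, because the own-position term $2p_0 - 1$ and the total back flow $2(1 - p_0)$ add up to one net offspring per reproduction event.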
From the simulations we know that in the limit of strong interactions among the individuals,
the dynamics is intermittent [14, 15]. Extended periods are dominated by a network
of few heavily occupied positions. These periods, called quasi-Evolutionary Stable Strategies
(q-ESS), are interrupted by sharp bursts where the configuration of the species changes
rapidly and significantly. In order to describe the dynamics, we impose a stability condition
on the difference equation: we require that within a single q-ESS, the average number of
individuals remains constant. Moreover, the q-ESS states are dominated by some very fit
positions surrounded by unfit neighbouring positions. Thus we can neglect the back-flow
contribution in the difference equation, Eq. (8), and obtain

$$n_a(t+1) = n_a(t) + \rho_a(t) \left[ p^a_{\mathrm{off}}(t)\,(2p_0 - 1) - p_{\mathrm{kill}} \right]. \tag{10}$$

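Averaging over time, the equation becomes

$$\overline{n_a} = \overline{n_a} + \overline{\rho_a\, p^a_{\mathrm{off}}}\,(2p_0 - 1) - \overline{\rho_a}\, p_{\mathrm{kill}}. \tag{11}$$

Assuming that $\overline{\rho_a\, p^a_{\mathrm{off}}} = \overline{\rho_a}\; \overline{p^a_{\mathrm{off}}}$, Eq. (11) requires that every position with
non-vanishing average occupation, $\overline{\rho_a} \neq 0$, has the same average fitness $\overline{p^a_{\mathrm{off}}} = p_q$, with

$$p_q = \frac{p_{\mathrm{kill}}}{2p_0 - 1}. \tag{12}$$

With $p_{\mathrm{mut}} = 0.008$ we have $p_0 = (1 - p_{\mathrm{mut}})^L = 0.852$ for $L = 20$; using $p_{\mathrm{kill}} = 0.2$, we find
$p_q = 0.284$, consistent with the observation of Fig. 2.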
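The two numbers quoted above follow from a two-line calculation, sketched here for convenience. The function names are illustrative; the guard against $2p_0 - 1 \le 0$ is an added safety check, not something discussed in the text.

```python
def exact_copy_probability(p_mut, L):
    """p_0 = (1 - p_mut)^L: probability that a replication introduces no mutation."""
    return (1 - p_mut) ** L

def qess_fitness(p_kill, p_mut, L):
    """p_q = p_kill / (2 p_0 - 1): fitness needed to balance killing in a q-ESS."""
    p0 = exact_copy_probability(p_mut, L)
    if 2 * p0 - 1 <= 0:
        raise ValueError("mutation rate too high: 2*p0 - 1 must be positive")
    return p_kill / (2 * p0 - 1)

p0 = exact_copy_probability(0.008, 20)    # parameters quoted in the text
p_q = qess_fitness(0.2, 0.008, 20)
```

Rounded to three decimals, `p0` and `p_q` reproduce the values 0.852 and 0.284 given in the text.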
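Neglecting the back flow is valid if all terms

$$\rho_b(t)\, p^b_{\mathrm{off}}(t)\, p_{\mathrm{mut}}^{\,L d_{ab}} \left[ 1 - p_{\mathrm{mut}} \right]^{L(1 - d_{ab})} = \rho_b(t)\, p^b_{\mathrm{off}}(t)\, p_{\mathrm{mut}}^{\,L d_{ab}} \left[ 1 - L(1 - d_{ab})\, p_{\mathrm{mut}} + \cdots \right]$$

are small. Since $p_{\mathrm{mut}} \ll 1$, the leading term is $\rho_b(t)\, p^b_{\mathrm{off}}(t)\, p_{\mathrm{mut}}^{\,L d_{ab}}$. This can be neglected if
$L d_{ab} > 1$. With $L d_{ab} = 1$ it can be neglected since none of the nearest neighbours are fit, as
$p^b_{\mathrm{off}}(t) \ll 1$.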
FIG. 2: The probability density function of the weight function $H = \ln\left(\frac{p_{\mathrm{off}}}{1 - p_{\mathrm{off}}}\right)$ during a q-ESS
state of a simulation (solid line) and during a transition between two q-ESS states (dashed line).
During a q-ESS state (solid line) positions fall into two sets: unfit positions, for which the weight
function is lower than $-3.0$, and fit positions, for which the fitness is greater than the average value
$\langle H \rangle = \ln\left(\frac{p_{\mathrm{kill}}}{1 - p_{\mathrm{kill}}}\right) \approx -1.38 = H_{\mathrm{hectic}}$, indicated by a vertical dotted line. During a transition
(dashed line) the fitness of all positions is normally distributed around $H_{\mathrm{hectic}}$, where all positions
reproduce (on average) at the same rate, equal to the killing rate. Notice that the support of the weight
function in the hectic phase exceeds $H_q$, ensuring that positions in genotype space are able to
fulfill the q-ESS balance Eq. (13). The parameters (for precise definitions, see Refs. [14, 15]) are
$p_{\mathrm{kill}} = 0.2$, $\mu = \frac{1}{1000} \ln\left(\frac{1 - p_{\mathrm{kill}}}{p_{\mathrm{kill}}}\right) \approx 0.0014$, $C = 10.0$ and $p_{\mathrm{mut}} = 0.008$.



IV. THE ERROR THRESHOLD 

The discussion of the q-ESS state was made with the implicit assumption of the existence
of q-ESS states. We will find here that we can establish qualitative arguments that ensure
the existence of the q-ESS states.

We have seen that q-ESS states are possible only if the interactions are important in the
weight function. Furthermore, the average fitness $p_q$ of the fit positions in the q-ESS state
is given by

$$p_q = \frac{p_{\mathrm{kill}}}{2(1 - p_{\mathrm{mut}})^L - 1} \tag{13}$$

and thus is related to the mutation rate. This relation states that the fit positions are those
that are able to counterbalance the killing by the production of offspring.

Equation (13) is the starting point for determining a necessary condition for the existence
of a q-ESS state. We have investigated the behaviour of the dynamics as a function of
mutation rate. The results are illustrated in Fig. 3.

For increasing $p_{\mathrm{mut}}$, the duration of q-ESS states decreases. Above a threshold $p^{\mathrm{th}}_{\mathrm{mut}}$ of
the mutation rate $p_{\mathrm{mut}}$, there are no more q-ESS states: the dynamics is completely hectic.
For intermediate values of $p_{\mathrm{mut}}$, the transitions between two q-ESS states are extended and
the initial transient can be very long.

This numerical result shows that the model defines an error threshold for the mutation 
rate above which no q-ESS state exists. 

From Eq. (13) we obtain for the weight function

$$H_q = \ln\left( \frac{p_q}{1 - p_q} \right) = \ln\left( \frac{p_{\mathrm{kill}}}{2p_0 - 1 - p_{\mathrm{kill}}} \right). \tag{14}$$

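When the mutation rate is close to $p^{\mathrm{th}}_{\mathrm{mut}}$, most of the simulations are in hectic states, for
which the fitness is equal to $p_{\mathrm{kill}}$, and therefore we might assume that the weight function is
equal to

$$H_{\mathrm{hectic}} = \ln\left( \frac{p_{\mathrm{kill}}}{1 - p_{\mathrm{kill}}} \right). \tag{15}$$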
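The two weight-function values of Eqs. (14) and (15) can be evaluated side by side. The sketch below assumes the relation $H = \ln[p_{\mathrm{off}}/(1 - p_{\mathrm{off}})]$ between weight function and reproduction probability used in the caption of Fig. 2; the function names are illustrative.

```python
from math import log

def weight(p_off):
    """Weight function H corresponding to a reproduction probability p_off."""
    return log(p_off / (1 - p_off))

def hectic_weight(p_kill):
    """Eq. (15): in the hectic phase the fitness equals the killing probability."""
    return weight(p_kill)

def qess_weight(p_kill, p_mut, L):
    """Eq. (14): weight of a position that satisfies the q-ESS balance."""
    p0 = (1 - p_mut) ** L
    return log(p_kill / (2 * p0 - 1 - p_kill))

H_hectic = hectic_weight(0.2)
H_q = qess_weight(0.2, 0.008, 20)
gap = H_q - H_hectic   # fluctuation a position needs in order to reach the q-ESS balance
```

For the parameters of Fig. 2, `H_hectic` agrees with the quoted value of about $-1.38$, and `H_q` lies above it, so a position must gain weight through fluctuations before it can sustain itself.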

Stable q-ESS states can only develop from a hectic phase when some positions, due
to fluctuations, acquire sufficient fitness to be consistent with the q-ESS balance given by
Eq. (13). That is, fluctuations in the weight functions in the hectic phase must allow

$$H_{\mathrm{hectic}} + \frac{\alpha}{C} > H_q, \tag{16}$$

where $\alpha \in (0, 1)$ describes the width of the distribution of weight functions in the hectic
phase, see Fig. 2, and $C$ determines the width of the distribution of the possible coupling
strengths between the individuals. Small $C$ corresponds to the strong coupling regime while
large $C$ corresponds to the weak coupling limit. Using Eq. (14) and Eq. (15) we obtain

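$$\ln\left( \frac{p_{\mathrm{kill}}}{1 - p_{\mathrm{kill}}} \right) + \frac{\alpha}{C} > \ln\left( \frac{p_{\mathrm{kill}}}{2p_0 - 1 - p_{\mathrm{kill}}} \right), \tag{17}$$

which, translated to the mutation rate $p_{\mathrm{mut}}$, becomes

$$p_{\mathrm{mut}} < 1 - \left[ \frac{e^{-\alpha/C}\,(1 - p_{\mathrm{kill}}) + 1 + p_{\mathrm{kill}}}{2} \right]^{1/L} \equiv p^{\mathrm{th}}_{\mathrm{mut}}. \tag{18}$$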
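The threshold of Eq. (18) is easy to evaluate numerically. The sketch below is a direct transcription of the formula, with `alpha`, `C`, `p_kill` and `L` as plain arguments; the function name is illustrative.

```python
from math import exp, log

def error_threshold(alpha, C, p_kill, L):
    """Eq. (18): largest mutation rate for which Eq. (17) can still be satisfied."""
    bracket = (exp(-alpha / C) * (1 - p_kill) + 1 + p_kill) / 2
    return 1 - bracket ** (1 / L)

# Consistency check: at the threshold, Eq. (17) holds with equality.
p_th = error_threshold(alpha=0.07, C=0.1, p_kill=0.2, L=20)
p0_at_threshold = (1 - p_th) ** 20
left_side = log(0.2 / (1 - 0.2)) + 0.07 / 0.1
right_side = log(0.2 / (2 * p0_at_threshold - 1 - 0.2))
```

The value $\alpha = 0.07$ is the fitted value quoted for Fig. 4; $C = 0.1$ is an arbitrary illustrative choice. Because $e^{-\alpha/C}$ grows with $C$, the threshold decreases as the coupling becomes weaker.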
FIG. 3: Occupation plots for different values of the mutation rate. The y-axis refers to an arbitrary
enumeration of all positions in genotype space. Occupied positions are indicated by a black dot.
Results shown are for $p_{\mathrm{kill}} = 0.2$, $\mu = \frac{1}{1000} \ln\left(\frac{1 - p_{\mathrm{kill}}}{p_{\mathrm{kill}}}\right)$ and $C = 0.05$. (a) Mutation rate:
$p_{\mathrm{mut}} = 0.009$. The initial transient is extended. (b) Mutation rate: $p_{\mathrm{mut}} = 0.00925$. The initial
transient has the same extension as any q-ESS state. (c) Mutation rate: $p_{\mathrm{mut}} = 0.0095$. The
transitions between two q-ESS states are extended. (d) Mutation rate: $p_{\mathrm{mut}} = 0.01$. The initial
transient is very extended. (e) Mutation rate: $p_{\mathrm{mut}} = 0.0104$. The initial transient and any
transitions are extensively hectic. (f) Mutation rate: $p_{\mathrm{mut}} = 0.0108$. There is no q-ESS state.






Eq. (18) defines the functional dependency of the error threshold in terms of $\alpha$, $C$ and $p_{\mathrm{kill}}$.
In Fig. 4 we use $\alpha$ as a fitting parameter and show $p^{\mathrm{th}}_{\mathrm{mut}}$ as a function of $C$.

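[Figure 4: log-log plot of the error threshold $p^{\mathrm{th}}_{\mathrm{mut}}$ (vertical axis, 0.001 to 0.1) versus $C$ (horizontal axis, 0.001 to 10).]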

FIG. 4: The computational determination of the error threshold. The loss of q-ESS states occurs
for mutation rates above the solid circles. The data, compared with the theoretically predicted
error threshold $p^{\mathrm{th}}_{\mathrm{mut}}$ (solid line), indicate a value of $\alpha = 0.07$, see Eq. (18). The parameters of the
simulations are $L = 20$, $\mu = 0.005$ and $p_{\mathrm{kill}} = 0.2$.

The error threshold has been determined numerically by iterating many simulations with
increasing value of the mutation rate for a given $C$. When no q-ESS emerges, we have
reached the error threshold; the lowest $p_{\mathrm{mut}}$ for which only hectic states exist is the
estimated value of $p^{\mathrm{th}}_{\mathrm{mut}}$.
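The scanning procedure just described can be outlined as follows. This is only a sketch of the bookkeeping: `has_qess(p_mut, seed)` is a hypothetical predicate that would run one simulation and report whether a q-ESS emerged, and `runs_per_value` is an assumed number of repetitions; neither is specified in the text.

```python
def estimate_threshold(p_mut_values, has_qess, runs_per_value=5):
    """Return the lowest mutation rate for which no run shows a q-ESS.

    p_mut_values   : candidate mutation rates to scan
    has_qess       : hypothetical callable (p_mut, seed) -> bool, wrapping a simulation
    runs_per_value : number of independent runs per mutation rate (an assumption)
    """
    for p_mut in sorted(p_mut_values):
        if not any(has_qess(p_mut, seed) for seed in range(runs_per_value)):
            return p_mut
    return None   # every scanned mutation rate still produced a q-ESS

# Toy stand-in for the simulation, used only to exercise the scan.
def toy_has_qess(p_mut, seed):
    return p_mut < 0.0105

scanned = [0.009, 0.00925, 0.0095, 0.01, 0.0104, 0.0108]
estimate = estimate_threshold(scanned, toy_has_qess)
```

With the toy predicate the scan stops at the first rate above the stand-in cut-off; in practice the resolution of the estimate is set by the spacing of the scanned mutation rates.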

The numerical results confirm the theoretical predictions given by Eq. (18) and, qualitatively,
are in line with the results of Eigen et al. [3, 4]. The transition in the Tangled Nature
model appears to be sharp, that is, for values of $p_{\mathrm{mut}}$ greater than $p^{\mathrm{th}}_{\mathrm{mut}}$ q-ESS states are
impossible, while for $p_{\mathrm{mut}} < p^{\mathrm{th}}_{\mathrm{mut}}$ q-ESS states are possible, see Fig. 4.

Since the factor $\alpha$ represents the width of the distribution of the weight function during
a hectic state, it is linked to $J = \{J_{ab}\}$, the set of interactions, and also to $\mu$. This makes it
difficult to determine $\alpha$ analytically.
V. DISCUSSION



In the Tangled Nature model the competition of the organisms is described by the mutual
interactions, creating a dynamical rugged fitness landscape where the fitness of a position is
determined by the temporal evolution of the ecology. The dynamics, illustrated in Ref. [14],
selects few heavily occupied positions in genotype space surrounded by other sequences in
the immediate vicinity. The central positions are the only ones able to reproduce actively. They
sustain themselves and all the surrounding ecology. This situation is possible only as long
as the mutual interactions are sufficient to counterbalance the dispersive action caused by
mutations.

Thus we have derived an interpretation of the Tangled Nature model as an evolutionary
quasi-species model. In the Tangled Nature model, however, the fitness landscape is not
fixed. Due to the frequency dependent fitness landscape, the Tangled Nature model allows
the emergence of multiple co-existing quasi-species during q-ESS states. Also, it should be
noted that, in contrast to the model by Eigen et al. [3], the quasi-species in the Tangled
Nature model are not absolute quantities but may change from one q-ESS to another.

We have discussed and identified the error threshold in the Tangled Nature model as the
mutation rate at which the model is unable to support, over extended periods in time,
the occupation of well defined multiple co-existing genotypes. A formula for the parameter
dependence of the error threshold was derived, see Eq. (18). In particular, the error threshold
depends on genome length as $1/L$ (for large $L$), which is consistent with the findings in the
quasi-species models. This result suggests that the mutation rate per base
pair itself is subject to selection in such a way that the mutation rate per base pair decreases with
increasing genome length. This is indeed observed in nature.
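The $1/L$ dependence can be made explicit by expanding Eq. (18) for large $L$: writing $b$ for the bracket in Eq. (18), $1 - b^{1/L} \approx -\ln(b)/L$. The sketch below compares the exact threshold with this limit; the function names and the parameter values ($\alpha = 0.07$, $C = 0.1$, $p_{\mathrm{kill}} = 0.2$) are illustrative choices.

```python
from math import exp, log

def error_threshold(alpha, C, p_kill, L):
    """Eq. (18): error threshold of the mutation rate."""
    bracket = (exp(-alpha / C) * (1 - p_kill) + 1 + p_kill) / 2
    return 1 - bracket ** (1 / L)

def large_L_prefactor(alpha, C, p_kill):
    """Limit of L * p_th for large L, from 1 - b**(1/L) ~ -ln(b)/L."""
    bracket = (exp(-alpha / C) * (1 - p_kill) + 1 + p_kill) / 2
    return -log(bracket)

prefactor = large_L_prefactor(0.07, 0.1, 0.2)
scaled = {L: L * error_threshold(0.07, 0.1, 0.2, L) for L in (20, 200, 2000)}
```

The entries of `scaled` approach `prefactor` as $L$ grows, which is the statement that the threshold falls off as $1/L$ for long genomes.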

We are extremely grateful to Matt Hall for very helpful discussions. K. C. gratefully 
acknowledges the financial support of U.K. EPSRC through Grant No. GR/R44683/01. 